Spoken Content Retrieval Using Distance Combination and Spoken Term Detection Using Hash Function for NTCIR10 SpokenDoc2 Task
Abstract
In this paper, we describe the spoken content retrieval (SCR) and spoken term detection (STD) methods that we used in the second round of the IR (Information Retrieval) for Spoken Documents (SpokenDoc2) task. Our SCR method maps the target documents into multiple vector spaces, including a word-based vector space for word-based speech recognition results and a syllable-based vector space for syllable-based speech recognition results. The syllable-based space is spanned by axes extracted using latent semantic indexing (LSI). We also apply query expansion and morpheme weighting to the word-based space. Finally, the distances between the query and the documents in each vector space are combined, and the documents are ranked by the combined distance. Our STD method, in turn, extracts sub-sequences from the target documents and converts them into bit sequences using a hash function. The query is converted into a bit sequence in the same way. Candidates are detected by calculating the Hamming distance between the bit sequence of the query and those of the target documents. Our method then calculates the distances between the query and the candidates using dynamic programming (DP) matching. To evaluate the proposed methods, we conducted spoken document retrieval experiments using the SpokenDoc task from the NTCIR-9 meeting. Using these experimental results to set our parameters, we submitted results for the SpokenDoc2 task at NTCIR-10.
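The STD pipeline summarized above (hash sub-sequences into bit sequences, filter candidates by Hamming distance, then re-score the survivors with DP matching) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the hash function, so `signature` is a hypothetical Bloom-filter-style hash that sets one bit per syllable bigram, chosen so that similar syllable sequences produce bit sequences at a small Hamming distance. The fixed window length (equal to the query length), the 64-bit width, the thresholds `max_hamming` and `max_dp`, and the use of plain Levenshtein distance as the DP matching cost are likewise assumptions.

```python
import zlib

N_BITS = 64  # hypothetical signature width (not given in the abstract)


def signature(syllables, n_bits=N_BITS):
    """Hypothetical hash: map a syllable sequence to an n_bits-wide bit sequence.

    Each syllable bigram (with boundary markers) sets one bit chosen by CRC-32,
    so sequences sharing most bigrams share most bits.
    """
    bits = 0
    padded = ["<s>", *syllables, "</s>"]
    for left, right in zip(padded, padded[1:]):
        bits |= 1 << (zlib.crc32(f"{left}|{right}".encode("utf-8")) % n_bits)
    return bits


def hamming(x, y):
    """Number of differing bits between two bit sequences stored as ints."""
    return bin(x ^ y).count("1")


def edit_distance(a, b):
    """DP matching, assumed here to be Levenshtein distance over syllables."""
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, 1):
        cur = [i]
        for j, sb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete sa
                           cur[j - 1] + 1,             # insert sb
                           prev[j - 1] + (sa != sb)))  # match or substitute
        prev = cur
    return prev[-1]


def detect(query, document, max_hamming=4, max_dp=1):
    """Return (start, dp_distance) pairs for document windows matching the query."""
    q_sig = signature(query)
    n = len(query)
    hits = []
    for start in range(len(document) - n + 1):
        window = document[start:start + n]  # sub-sequence of the target document
        # Stage 1: cheap Hamming-distance filter on bit sequences
        # (computed on the fly here; a real index would precompute these).
        if hamming(q_sig, signature(window)) > max_hamming:
            continue
        # Stage 2: re-score the surviving candidate with DP matching.
        d = edit_distance(query, window)
        if d <= max_dp:
            hits.append((start, d))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Romanized syllables; "se e" stands in for a misrecognized "se i".
    doc = "ko re wa o n se e ke n sa ku no ji kke n de su".split()
    print(detect("o n se i".split(), doc))  # [(3, 1)]
```

With this particular hash, one substituted syllable changes exactly two bigrams, so it can flip at most two bits that only the query sets and two bits that only the window sets. The misrecognized window is therefore at most four bits away from the query, passes the filter, and is then confirmed by DP matching. The design choice is the usual filter-then-verify trade-off: the bit comparison is cheap but approximate, while the DP step is exact but costly, so it runs only on the few windows that survive.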
Similar resources
Spoken Term Detection by N-gram Index with Exact Distance for NTCIR-SpokenDoc2
For spoken term detection, it is very important to consider out-of-vocabulary (OOV) words. Therefore, sub-word unit based recognition and retrieval methods have been proposed. This paper describes a very fast Japanese spoken term detection system that is robust to OOV words. We used individual syllables as sub-word units in continuous speech recognition and an n-gram index of syllables in...
STD Method Based on Hash Function for NTCIR11 SpokenQuery&Doc Task
In this paper, we describe a spoken term detection (STD) method which is used in the Spoken Query and Documents task of the NTCIR-11 meeting. Our STD method extracts sub-sequences from the syllable-based speech recognition candidates of the target speech and converts them into bit sequences using a hash function. The query is also converted into a bit sequence in the same way. Term detection candidates ...
DTW-Distance-Ordered Spoken Term Detection and STD-based Spoken Content Retrieval: Experiments at NTCIR-10 SpokenDoc-2
In this paper, we report our experiments at the NTCIR-10 SpokenDoc-2 task. We participated in both the STD and SCR subtasks of SpokenDoc. For the STD subtask, we applied a novel indexing method, called metric subspace indexing, which we previously proposed. One of the distinctive advantages of the method was that it could output the detection results in increasing order of distance without using any predefined...
Spoken Term Detection Based on a Syllable N-gram Index at the NTCIR-11 SpokenQuery&Doc Task
For spoken term detection, it is crucial to consider out-of-vocabulary (OOV) words and the misrecognition of spoken words. Therefore, various sub-word unit based recognition and retrieval methods have been proposed. We also proposed a distant n-gram indexing/retrieval method for spoken queries, which is based on a syllable n-gram and incorporates a distance metric in a syllable lattice. The distance ...
Spoken Term Detection and Spoken Content Retrieval: Evaluations on NTCIR 11 SpokenQuery&Doc Task
In this paper, we report our experiments on the NTCIR-11 SpokenQuery&Doc task for spoken term detection (STD) and spoken content retrieval (SCR). In STD, we consider acoustic feature similarity between utterances over both word and sub-word lattices to deal with the general problem of open-vocabulary retrieval with queries of variable length. In SCR, we modify term frequency using expected term fre...
Publication date: 2013